Abstract
A search is conducted for a target moving in discrete time between a finite number of cells according to a known Markov process. The set of cells available for search in a given time period is a function of the cell searched in the previous time period. The problem is formulated and solved as a partially observable Markov decision process (POMDP). A finite time horizon POMDP solution technique is presented which is simpler than the standard linear programming methods.


THE OPTIMAL SEARCH FOR A MOVING TARGET
WHEN THE SEARCH PATH IS CONSTRAINED

1. Problem Statement

A discrete time search is conducted for a target moving between a finite set of cells $C = \{1,\ldots,N\}$. At the beginning of each time period, one cell is searched. If cell $i$ was searched in the previous time period, the current search cell must be selected from the set $C_i \subseteq C$. If the target is in the selected cell $k$, it is detected with probability $q_k \in [0,1]$. If the target is not in the cell searched, it cannot be detected during the current time period. After an unsuccessful search, a target in cell $i$ moves to cell $j$ with probability $p_{ij}$ for the next time period. The transition matrix, $P = [p_{ij}]$, is known to the searcher. The object of the search is to maximize the $T$-time period probability of detection.
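
As a concrete reading of the objective, the sketch below evaluates the $T$-period detection probability of one fixed, feasible search path by tracking, for each cell, the probability that the target is there and still undetected. It is an illustrative sketch, not part of the original development: cells are numbered from 0, `P`, `q`, and `C` are plain Python lists (with `C[i]` a set), and the function name `path_detection_probability` is hypothetical.

```python
from typing import List, Sequence, Set


def path_detection_probability(
    P: List[List[float]],
    q: Sequence[float],
    pi: Sequence[float],
    path: Sequence[int],
    C: Sequence[Set[int]],
    start: int,
) -> float:
    """Probability of detecting the target while searching the cells in `path`.

    P[i][j]: probability that the target moves from cell i to cell j
    q[k]:    detection probability when the target is in the searched cell k
    pi:      target distribution before the first search
    C[i]:    cells that may be searched right after searching cell i
    start:   cell occupied by the searcher before the search begins
    Cells are numbered from 0 in this sketch.
    """
    n = len(pi)
    mass = list(pi)  # Pr{target in cell and not yet detected}
    detected = 0.0
    prev = start
    for k in path:
        if k not in C[prev]:
            raise ValueError(f"cell {k} cannot be searched after cell {prev}")
        detected += q[k] * mass[k]   # detection during this period
        mass[k] *= 1.0 - q[k]        # target escapes an unsuccessful look
        # the undetected target moves before the next period
        mass = [sum(mass[i] * P[i][j] for i in range(n)) for j in range(n)]
        prev = k
    return detected


# Two cells; the target is equally likely to stay or switch; q = .5 in both.
P2 = [[0.5, 0.5], [0.5, 0.5]]
print(path_detection_probability(P2, [0.5, 0.5], [1.0, 0.0], [0, 0],
                                 [{0, 1}, {0, 1}], start=0))  # prints 0.625
```

Maximizing this quantity over all feasible paths is the problem stated above; enumerating every path quickly becomes impractical as $T$ grows.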


2. Background

The moving target problem has received considerable attention, much of it recent. Washburn [1980] and Stone and Kadane [1981] list the important references. Pollock [1970] solved the problem addressed here for $N = 2$ and $C_1 = C_2 = C$. Washburn [1980] and Brown [1980] introduced a powerful technique giving exact solutions for the $N$-cell case, provided that all cells are available for search in each time period (i.e., $C_i = C$, $i = 1,\ldots,N$), that search effort can be infinitesimally divided between the cells, and that the detection function is exponential. Stewart [1980] adapted this technique to the search problem considered here by using branch-and-bound methods. As Stewart observed, however, the nonconvexity of the space of possible search plans allows this method to converge to suboptimal solutions.

Smallwood and Sondik [1973] and Monahan [1982] noted that the 2-cell problem solved by Pollock [1970] could be modelled as a partially observable Markov decision process (POMDP) and that an $N$-cell extension was possible. This paper makes that extension and, in addition, allows the set of possible search cells in a given time period to be a function of the search cell selected in the previous time period. This permits modelling searches in which the searcher can travel only a limited distance between time periods: the search cell in a given time period must lie within some specified neighborhood of the search cell in the previous time period.

Also reported on is a finite time horizon POMDP solution technique which is simpler than the standard linear programming techniques (e.g., Monahan [1982]) and which, according to initial computational experience, executes more quickly.


3. Mathematical Development

As is standard for many problems exploiting a Markov assumption, the solution technique used here is dynamic programming. This method requires that the process being modelled be defined in terms of a sufficient statistic (Bertsekas [1976], p. 122). Following Sondik [1971], Smallwood and Sondik [1973] and Platzman [1980], we use the row vector $(\pi(k), i) \in \mathbb{R}^{N+1}$, where

$$ \pi_j(k) = \Pr\{\text{the target is in cell } j \text{ at the beginning of time period } k, \text{ given unsuccessful search in all previous time periods}\}, $$

and $i \in C$ is the cell searched in the previous time period. If the dependence on $k$ is clear from context, $\pi(k)$ will be written as $\pi$. The state space then becomes $\Pi \times C$, where

$$ \Pi = \{\pi \in \mathbb{R}^N \mid \pi\mathbf{1} = 1,\ \pi \ge 0\}, $$

and $\mathbf{1}$ and $0$ can be either vectors or scalars. The vector inequality $a \ge b$ means $a_i \ge b_i$ for all $i$.

Following the dynamic programming convention of labelling "backwards in time", we define $V_n(\pi, i)$ to be the maximum obtainable probability of detection with $n$ time periods remaining and current state vector $(\pi, i)$. Let $T_j(\pi) \in \mathbb{R}^N$ be $\pi$ updated, using Bayes's rule, for an unsuccessful search in cell $j$ followed by one transition of the target. That is,

$$ T_j(\pi) = (1 - q_j\pi_j)^{-1}\,\pi P_j, \qquad (1) $$

where $P_j \in \mathbb{R}^{N \times N}$ is $P$ with row $j$ multiplied by $(1 - q_j)$.

If $q_j\pi_j = 1$, then the search in the current time period detects the target with certainty, and (1) is not defined. We can now write $V_n(\pi, i)$ in terms of $V_{n-1}(\pi, j)$ as follows:

$$ V_n(\pi, i) = \max_{j \in C_i} \big\{ q_j\pi_j + (1 - q_j\pi_j)\, V_{n-1}(T_j(\pi), j) \big\}, \qquad (2) $$

with $V_0(\pi, i) = 0$.
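
For small problems, (1) and (2) can be evaluated literally, which gives a useful reference when checking faster methods. The sketch below is a plain-Python illustration, not the paper's program: it enumerates every feasible sequence of cells, so its cost grows like $|C_i|^n$. Cells are 0-based, and `bayes_update` and `value` are hypothetical names.

```python
from typing import List, Sequence, Set


def bayes_update(pi: Sequence[float], P: List[List[float]],
                 q: Sequence[float], j: int) -> List[float]:
    """T_j(pi) from (1): condition on an unsuccessful search of cell j, then move."""
    n = len(pi)
    denom = 1.0 - q[j] * pi[j]
    if denom <= 0.0:
        raise ValueError("q_j * pi_j = 1: detection is certain and (1) is undefined")
    # pi P_j, where P_j is P with row j multiplied by (1 - q_j)
    weights = [pi[r] * ((1.0 - q[j]) if r == j else 1.0) for r in range(n)]
    return [sum(weights[r] * P[r][m] for r in range(n)) / denom
            for m in range(n)]


def value(n: int, pi: Sequence[float], i: int, P: List[List[float]],
          q: Sequence[float], C: Sequence[Set[int]]) -> float:
    """V_n(pi, i) by direct application of recursion (2), with V_0 = 0."""
    if n == 0:
        return 0.0
    best = 0.0
    for j in C[i]:
        p_now = q[j] * pi[j]
        if p_now >= 1.0:
            return 1.0  # certain detection; no larger value is possible
        future = value(n - 1, bayes_update(pi, P, q, j), j, P, q, C)
        best = max(best, p_now + (1.0 - p_now) * future)
    return best


P2 = [[0.5, 0.5], [0.5, 0.5]]
C2 = [{0, 1}, {0, 1}]
print(value(2, [1.0, 0.0], 0, P2, [0.5, 0.5], C2))  # prints 0.625
```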

Equation (2) is the dynamic programming recursion that must be solved in each time period. It looks formidable, primarily because $\pi$ is real-valued rather than discrete. We will show, however, that $V_n(\pi, i)$ may be expressed in a particularly simple form. Namely,

$$ V_n(\pi, i) = \max_{a \in A(n,i)} \pi a, \qquad (3) $$

where $A(n,i)$ is a finite collection of $N$-vectors. The dynamic programming problem then becomes one of constructing $A(n,i)$ from $A(n-1, j)$.

If $C_i = C$ for all $i$, then the search problem as formulated becomes a standard POMDP and can be solved using the linear programming methods of Sondik [1971], Smallwood and Sondik [1973], or Monahan [1982]. Allowing the action selected in the previous time period to constrain the actions available in the present time period requires an augmented state space ($\Pi \times C$ rather than $\Pi$) and represents a generalization of the standard model. However, as the next theorem shows, the basic form of the POMDP solution remains the same. Specifically, $V_n(\pi, i)$ is piecewise linear and convex in $\pi$.


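Theorem: For $n = 0,\ldots,T$ and $i \in C$, $V_n(\pi, i)$ is piecewise linear and convex in $\pi$. That is,

$$ V_n(\pi, i) = \max_{a \in A(n,i)} \pi a, \qquad (4) $$

where $A(n,i)$ is a finite set of $N$-vectors.

Proof: We proceed by induction. (4) holds trivially for $n = 0$ with $A(0,i) = \{0\}$. For $n = 1$ it also holds, since from (2)

$$ V_1(\pi, i) = \max_{j \in C_i} q_j\pi_j = \max_{a \in A(1,i)} \pi a, $$

where $A(1,i) = \{ q_j E_j \mid j \in C_i \}$ and $E_j \in \mathbb{R}^N$ is a column vector with a 1 in the $j$th place and 0's elsewhere.

Now assume (4) holds for time period $n-1$. From (2) and (1),

$$
\begin{aligned}
V_n(\pi, i) &= \max_{j \in C_i} \Big\{ q_j\pi_j + (1 - q_j\pi_j) \max_{a_j \in A(n-1,j)} T_j(\pi)\, a_j \Big\} \\
&= \max_{j \in C_i} \Big\{ q_j\pi_j + (1 - q_j\pi_j) \max_{a_j \in A(n-1,j)} (1 - q_j\pi_j)^{-1}\, \pi P_j a_j \Big\} \\
&= \max_{j \in C_i,\ a_j \in A(n-1,j)} \big\{ q_j\pi_j + \pi P_j a_j \big\} \\
&= \max_{a \in A(n,i)} \pi a, \qquad (5)
\end{aligned}
$$

where

$$ A(n,i) = \{ a \in \mathbb{R}^N \mid a = E_j q_j + P_j a_j,\ j \in C_i,\ a_j \in A(n-1,j) \}. \qquad (6) $$

(If $q_j\pi_j = 1$ for some $j \in C_i$, then $q_j = 1$ and $\pi$ puts all its mass on cell $j$, so $\pi P_j = 0$ and the corresponding term in the third line equals 1, the probability of certain detection; the third line therefore remains correct even though (1) is undefined.)

So $V_n(\pi, i)$ is of the proper form and the proof is complete.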
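
Construction (6) translates directly into code. The sketch below (plain Python, 0-based cells, hypothetical helper names `next_sets`, `vector_sets`, `V`) builds the unpruned sets $A(n,i)$ for all cells from $A(n-1,j)$ and evaluates $V_n$ through (4). On the small two-cell problem it reproduces the value 0.625 obtained from recursion (2), and it shows the $M^n$ growth in the number of vectors.

```python
from typing import List, Sequence, Set

Vector = List[float]


def next_sets(A_prev: List[List[Vector]], P: List[List[float]],
              q: Sequence[float], C: Sequence[Set[int]]) -> List[List[Vector]]:
    """Return [A(n, i) for each i] built from [A(n-1, j) for each j] via (6)."""
    N = len(P)
    A_new = []
    for i in range(N):
        vectors = []
        for j in sorted(C[i]):
            for a_j in A_prev[j]:
                # P a_j, then scale row j by (1 - q_j) and add q_j E_j
                a = [sum(P[r][m] * a_j[m] for m in range(N)) for r in range(N)]
                a[j] = (1.0 - q[j]) * a[j] + q[j]
                vectors.append(a)
        A_new.append(vectors)
    return A_new


def vector_sets(T: int, P: List[List[float]], q: Sequence[float],
                C: Sequence[Set[int]]) -> List[List[List[Vector]]]:
    """sets[n][i] is A(n, i) for n = 0, ..., T, starting from A(0, i) = {0}."""
    N = len(P)
    sets = [[[[0.0] * N] for _ in range(N)]]
    for _ in range(T):
        sets.append(next_sets(sets[-1], P, q, C))
    return sets


def V(pi: Sequence[float], A: List[Vector]) -> float:
    """Evaluate (4): the maximum of pi a over a in A."""
    return max(sum(p * x for p, x in zip(pi, a)) for a in A)


P2 = [[0.5, 0.5], [0.5, 0.5]]
sets = vector_sets(2, P2, [0.5, 0.5], [{0, 1}, {0, 1}])
print(V([1.0, 0.0], sets[2][0]), len(sets[2][0]))  # prints 0.625 4
```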

For any finite $n$ and $i \in C$, $A(n,i)$ is a finite set. However, suppose $A(n,i)$ is generated using (6) and assume (for illustration purposes only) that the number of elements in $C_i$ is $M$ for all $i \in C$. Then the number of vectors in $A(n,i)$ is $M$ times the number in $A(n-1,i)$. Since there are $M$ vectors in $A(1,i)$, there are apparently $M^n$ vectors in $A(n,i)$. This equals the number of possible search paths for the $n$-time period problem that begin with cell $i$, and suggests that total enumeration of search paths might be as effective as this procedure.

Fortunately, this is not necessarily the case. Following Smallwood and Sondik [1973], we note that some of the vectors in $A(n,i)$ can be removed without changing the maximization in (5). We say that $\hat a \in A(n,i)$ is dominated if, for every $\pi \in \Pi$,

$$ \max_{a \in A(n,i)} \pi a = \max_{a \in A(n,i),\ a \ne \hat a} \pi a. \qquad (7) $$

Dominated vectors can be removed from $A(n,i)$ and need not be used in the construction of $A(n+1, j)$.

Sondik [1971] first provided a linear programming technique to identify dominated vectors for the POMDP. Following a slight modification in Monahan [1982], we solve the following linear program to check $\hat a \in A(n,i)$ for dominance:

$$
\begin{aligned}
\min_{\pi,\, x}\quad & x - \pi\hat a \qquad (8) \\
\text{s.t.}\quad & x \ge \pi a \quad \forall\, a \in A(n,i),\ a \ne \hat a, \\
& \pi \in \Pi.
\end{aligned}
$$

Whenever the minimal value of $x - \pi\hat a$ is non-negative, $\hat a$ is dominated and can be removed from $A(n,i)$. The linear programming solution technique need not continue to optimality: as soon as the objective function becomes negative, $\hat a$ is known not to be dominated. (This method is similar to the branch-and-bound technique of Stewart [1980] in that both are enumerative procedures that systematically eliminate search paths which cannot be optimal.)
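
A sketch of the dominance test (8), using SciPy's `linprog` with the HiGHS solver as a modern stand-in for the LP code; it is not the original program, and it solves each LP to optimality rather than stopping at the first negative objective value as the text permits. Decision variables are stacked as $(\pi_1,\ldots,\pi_N, x)$, and `is_dominated` and `prune_lp` are hypothetical names. Removing dominated vectors one at a time is safe because each removal leaves the maximum in (5) unchanged.

```python
from typing import List, Sequence

from scipy.optimize import linprog


def is_dominated(a_hat: Sequence[float], others: List[Sequence[float]],
                 tol: float = 1e-9) -> bool:
    """LP (8): min x - pi a_hat  s.t.  x >= pi a (a in others), pi in Pi."""
    if not others:
        return False                               # a lone vector is never dominated
    N = len(a_hat)
    c = [-v for v in a_hat] + [1.0]                # objective x - pi a_hat
    A_ub = [list(a) + [-1.0] for a in others]      # pi a - x <= 0
    b_ub = [0.0] * len(others)
    A_eq = [[1.0] * N + [0.0]]                     # pi 1 = 1
    b_eq = [1.0]
    bounds = [(0, None)] * N + [(None, None)]      # pi >= 0, x free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.fun >= -tol                         # non-negative minimum


def prune_lp(vectors: List[List[float]]) -> List[List[float]]:
    """Remove dominated vectors one at a time."""
    kept = list(vectors)
    k = 0
    while k < len(kept):
        if is_dominated(kept[k], kept[:k] + kept[k + 1:]):
            kept.pop(k)
        else:
            k += 1
    return kept


A = [[1.0, 0.0], [0.0, 1.0], [0.4, 0.4], [0.6, 0.6]]
print(prune_lp(A))  # [0.4, 0.4] is removed; [0.6, 0.6] survives
```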

Once the reduced vector sets $A(n,i)$ have been generated for all $i \in C$ and $n = 0,\ldots,T$, the maximum probability of detection and the optimal $T$-time period search plan can be determined for any initial target distribution $\pi$. Assume that before the search begins the searcher is in cell $i$, so that the initial search cell must be in $C_i$. Then the maximum obtainable $T$-time period probability of detection is

$$ \max_{a \in A(T,i)} \pi a. \qquad (9) $$

(If the searcher's starting cell, $i$, can be any element of $C$, (9) is maximized over all $i \in C$ to find the maximum probability of detection.) The cell searched in time period $T$ is the $j \in C_i$ used in (6) to construct the argmax of (9). If cell $j$ is searched in time period $T$ and the target is not detected, then (9) is re-solved for time period $T-1$ with $T_j(\pi)$ replacing $\pi$ and $A(T-1, j)$ replacing $A(T,i)$.

Alternatively (and perhaps more simply), one can note that each $a \in A(T,i)$ has associated with it not just the cell searched in time period $T$ but a sequence of $T$ cells, built up by the sequential application of (6). When a particular $a \in A(T,i)$ maximizes (9), the sequence of cells associated with the vector $a$ is the optimal search path.
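
The second way of recovering the plan is easy to sketch: carry each vector's cell sequence along with it while applying (6), then take the argmax in (9). The sketch below does this without any pruning, so it is only practical for small $T$; cells are 0-based, each path lists the cells searched in periods $T, T-1, \ldots, 1$, and `labelled_sets` and `optimal_plan` are hypothetical names. In the usage lines a stationary target sits in the third of three cells along a line, and the searcher, starting in the first cell, can move at most one cell per period.

```python
from typing import List, Sequence, Set, Tuple

Entry = Tuple[List[float], Tuple[int, ...]]  # (vector a, associated cell sequence)


def labelled_sets(T: int, P: List[List[float]], q: Sequence[float],
                  C: Sequence[Set[int]]) -> List[List[Entry]]:
    """Return A(T, i) for every i, each vector paired with its search path."""
    N = len(P)
    A: List[List[Entry]] = [[([0.0] * N, ())] for _ in range(N)]  # A(0, i) = {0}
    for _ in range(T):
        A_new = []
        for i in range(N):
            entries = []
            for j in sorted(C[i]):
                for a_j, path in A[j]:
                    a = [sum(P[r][m] * a_j[m] for m in range(N)) for r in range(N)]
                    a[j] = (1.0 - q[j]) * a[j] + q[j]   # rule (6)
                    entries.append((a, (j,) + path))    # j is searched first
            A_new.append(entries)
        A = A_new
    return A


def optimal_plan(pi: Sequence[float], i: int,
                 A: List[List[Entry]]) -> Tuple[float, Tuple[int, ...]]:
    """Solve (9) and return the maximizing value and its search path."""
    def score(entry: Entry) -> float:
        return sum(p * x for p, x in zip(pi, entry[0]))
    best = max(A[i], key=score)
    return score(best), best[1]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
line_moves = [{0, 1}, {0, 1, 2}, {1, 2}]
A2 = labelled_sets(2, identity, [1.0, 1.0, 1.0], line_moves)
print(optimal_plan([0.0, 0.0, 1.0], 0, A2))  # prints (1.0, (1, 2))
```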


4. The Dual Definition of Dominance and a Geometric Interpretation

The linear programming dual of (8) is

$$
\begin{aligned}
\max_{\lambda,\, v}\quad & v \qquad (10) \\
\text{s.t.}\quad & \sum_{l=1}^{k} \lambda_l a_l - v\mathbf{1} \ge \hat a, \\
& \lambda\mathbf{1} = 1, \\
& \lambda \ge 0,
\end{aligned}
$$

where $l = 1,\ldots,k$ indexes all vectors in $A(n,i)$ except $\hat a$. The duality theorem of linear programming (Dantzig [1963], p. 125, or Luenberger [1973], p. 72) states that the primal has a finite optimal solution if and only if the dual has a finite optimal solution, and that when optimal solutions exist, the two optimal objective values are equal.

We know that $\hat a \in A(n,i)$ is dominated when the minimal value of the objective function of (8) is non-negative. In this case, the duality theorem requires that (10) be feasible and that the optimal value of $v$ also be non-negative. Thus, from the constraints of (10), there exists a convex combination of the elements of $A(n,i)$ other than $\hat a$ which is (in a vector sense) greater than or equal to $\hat a$. The strength of the duality theorem allows the implication to hold in the other direction as well. That is, if such a convex combination of vectors in $A(n,i)$ exists, then $\hat a$ is dominated. (Directly: if $b = \sum_l \lambda_l a_l \ge \hat a$, then for every $\pi \in \Pi$, $\pi\hat a \le \pi b \le \max_l \pi a_l$.)

The dual characterization of dominance allows a simple geometric interpretation. If $B$ is the convex hull of the vectors in $A(n,i)$ other than $\hat a$, then $\hat a$ is dominated if and only if there exists $b \in B$ such that $b \ge \hat a$.
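
The dual form gives a dominance test that also returns a certificate: the weights $\lambda$ of a convex combination of the other vectors that covers $\hat a$ componentwise whenever $v \ge 0$. The sketch again uses SciPy's `linprog` (HiGHS) as an assumed modern solver, with variables stacked as $(\lambda_1,\ldots,\lambda_k, v)$; `dual_dominance` is a hypothetical name. For the small example, the optimal $v$ equals the optimal value of the primal (8) for the same vectors, as the duality theorem requires.

```python
from typing import List, Sequence, Tuple

from scipy.optimize import linprog


def dual_dominance(a_hat: Sequence[float], others: List[Sequence[float]]
                   ) -> Tuple[float, List[float]]:
    """LP (10): max v  s.t.  sum_l lambda_l a_l - v 1 >= a_hat, lambda 1 = 1, lambda >= 0.

    Returns (v, lambda); a_hat is dominated iff v >= 0 (up to round-off).
    """
    k, N = len(others), len(a_hat)
    c = [0.0] * k + [-1.0]                       # maximize v by minimizing -v
    # component r: -sum_l lambda_l a_l[r] + v <= -a_hat[r]
    A_ub = [[-others[l][r] for l in range(k)] + [1.0] for r in range(N)]
    b_ub = [-x for x in a_hat]
    A_eq = [[1.0] * k + [0.0]]                   # weights sum to one
    bounds = [(0, None)] * k + [(None, None)]    # lambda >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, list(res.x[:k])


others = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
v, lam = dual_dominance([0.4, 0.4], others)
print(round(v, 6))  # prints 0.2: [0.6, 0.6] alone covers [0.4, 0.4]
```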

5. Alternative Solution Techniques

The POMDP solution procedure described above requires extensive calculations. To reduce $A(n,i)$ to its minimal size, each $a \in A(n,i)$ must be checked for dominance by solving a potentially large linear program. The question naturally arises as to whether a simpler or more quickly executed procedure could be found, even if such a procedure did not necessarily reduce $A(n,i)$ to its minimal size.

Possibly the simplest such reduction scheme is to compare each pair $a_j$ and $a_k$ ($a_j \ne a_k$) in $A(n,i)$ and to discard $a_j$ if $a_j \le a_k$, or $a_k$ if $a_k \le a_j$. The remaining vectors can then be further reduced using linear programming methods, or the larger-than-minimal $A(n,i)$ can be used directly to construct $A(n+1, j)$ by (6). Both of these procedures were coded for the IBM 3033 at the Naval Postgraduate School, and, for the search problems examined, the latter method, using no linear programming at all, generated optimal solutions more quickly and required less computer storage. Both methods appeared preferable to using only linear programming methods to check for dominance.
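
The pairwise scheme needs nothing beyond componentwise comparisons. A minimal sketch, assuming vectors are plain Python lists (`covers` and `pairwise_prune` are hypothetical names): it keeps one copy of each vector that no other vector covers componentwise. As the second usage line shows, it can leave behind a vector that is dominated in the sense of (7) only by a convex combination of others, which is why the result may be larger than minimal.

```python
from typing import List


def covers(u: List[float], w: List[float]) -> bool:
    """True if w <= u componentwise."""
    return all(x <= y for x, y in zip(w, u))


def pairwise_prune(vectors: List[List[float]]) -> List[List[float]]:
    """Discard every vector that is componentwise <= some other vector."""
    kept: List[List[float]] = []
    for v in vectors:
        if any(covers(w, v) for w in kept):
            continue                                  # v is no better than a kept vector
        kept = [w for w in kept if not covers(v, w)]  # drop kept vectors that v covers
        kept.append(v)
    return kept


print(pairwise_prune([[1.0, 0.0], [0.0, 1.0], [0.4, 0.4], [0.6, 0.6], [0.5, 0.2]]))
# [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
print(pairwise_prune([[1.0, 0.0], [0.0, 1.0], [0.4, 0.4]]))
# all three survive, although [0.4, 0.4] is dominated in the sense of (7)
```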




6. An Example Problem

A simple 5-cell search problem is described by the following parameters:


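$$
P = \begin{pmatrix}
.75 & .25 & 0 & 0 \\
.25 & .5 & .25 & 0 \\
0 & .25 & .5 & .25 \\
0 & 0 & .25 & .75
\end{pmatrix}
\qquad \text{(rows and columns indexed by cells } 2, 3, 4, 5\text{; } p_{j1} = 0 \text{ for } j = 2,\ldots,5\text{)}
$$

$C = \{1,2,3,4,5\}$
$C_1 = \{2\}$
$C_2 = \{2,3\}$
$C_3 = \{2,3,4\}$
$C_4 = \{3,4,5\}$
$C_5 = \{4,5\}$
$q_i = q$ for all $i$
searcher's starting cell: 1
$T = 7$
$\pi(7) = (0,0,0,0,1)$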

The target starts in cell 5 and the searcher in cell 1. Since $C_1 = \{2\}$, the initial cell searched is 2. After the initial search, cell 1 is inaccessible to both the searcher and the target.

The optimal search path and the maximum obtainable probability of detection ($P_d$) are given in Table 1 for $q$ = .2, .4, .6, .8, and 1. Using the simplest reduction method (i.e., no linear programming), the number of vectors in $A(7,1)$ increased from 3 for $q = 1$ to 187 for $q = .2$. The CPU time required to obtain the optimal solution increased from 24 seconds for $q = 1$ to 536 seconds for $q = .2$.


q      Optimal search path    P_d     # vectors in A(7,1)    CPU time (sec)
.2     2 3 4 5 5 5 5          .357    187                    536
.4     2 3 4 5 5 4 5          .594     89                    280
.6     2 3 4 5 4 5 4          .757     49                    179
.8     2 3 4 5 4 5 4          .867     26                    169
1.0    2 3 4 5 4 3 2          .934      3                     24

Table 1. Example Problem Results
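
The example can be re-run with a short script that combines construction (6), the pairwise reduction of Section 5 (no linear programming), and the path bookkeeping described at the end of Section 3. This is a sketch, not the original IBM 3033 program, and the names `dot`, `pairwise_prune`, and `solve` are hypothetical. Cells are 0-based in the code, and paths are printed 1-based to match Table 1. Row 1 of $P$ cannot affect $P_d$ here, because the target starts in cell 5 and cannot enter cell 1, so the sketch fills that row with a hypothetical placeholder (the target stays put); the placeholder may change the vector counts but not $P_d$. Under these assumptions the script should report $P_d \approx .934$ for $q = 1$ and $P_d \approx .357$ for $q = .2$, in line with Table 1; where several paths tie, it may report a different optimal path.

```python
from typing import List, Sequence, Set, Tuple

Entry = Tuple[List[float], Tuple[int, ...]]  # (vector a, cells searched, first cell first)


def dot(pi: Sequence[float], a: Sequence[float]) -> float:
    return sum(p * x for p, x in zip(pi, a))


def pairwise_prune(entries: List[Entry]) -> List[Entry]:
    """Section 5 reduction: drop any vector that is componentwise <= another one."""
    kept: List[Entry] = []
    for a, path in entries:
        if any(all(x <= y for x, y in zip(a, b)) for b, _ in kept):
            continue                                        # a <= some kept b
        kept = [(b, p) for b, p in kept
                if not all(y <= x for x, y in zip(a, b))]   # drop kept b <= a
        kept.append((a, path))
    return kept


def solve(P: List[List[float]], q: Sequence[float], C: Sequence[Set[int]],
          T: int, pi: Sequence[float], start: int) -> Tuple[float, Tuple[int, ...]]:
    """Maximum T-period detection probability (9) and a maximizing path."""
    N = len(P)
    A: List[List[Entry]] = [[([0.0] * N, ())] for _ in range(N)]  # A(0, i) = {0}
    for _ in range(T):
        A_new = []
        for i in range(N):
            entries = []
            for j in sorted(C[i]):
                for a_j, path in A[j]:
                    a = [sum(P[r][m] * a_j[m] for m in range(N)) for r in range(N)]
                    a[j] = (1.0 - q[j]) * a[j] + q[j]       # rule (6)
                    entries.append((a, (j,) + path))
            A_new.append(pairwise_prune(entries))
        A = A_new
    best_a, best_path = max(A[start], key=lambda e: dot(pi, e[0]))
    return dot(pi, best_a), best_path


# Example data, 0-based: index 0 is cell 1, ..., index 4 is cell 5.
P = [
    [1.0, 0.0, 0.0, 0.0, 0.0],    # cell 1: hypothetical placeholder row, cannot affect P_d
    [0.0, 0.75, 0.25, 0.0, 0.0],  # cell 2
    [0.0, 0.25, 0.5, 0.25, 0.0],  # cell 3
    [0.0, 0.0, 0.25, 0.5, 0.25],  # cell 4
    [0.0, 0.0, 0.0, 0.25, 0.75],  # cell 5
]
C = [{1}, {1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4}]  # C_1, ..., C_5 in 0-based form
pi7 = [0.0, 0.0, 0.0, 0.0, 1.0]                   # target starts in cell 5

for qv in (0.2, 0.4, 0.6, 0.8, 1.0):
    pd, path = solve(P, [qv] * 5, C, 7, pi7, start=0)
    print(qv, [c + 1 for c in path], round(pd, 3))  # path printed with 1-based cells
```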